Mapping of Sequence Reads to the Reference Genomes    ◾    69

the quality string is not stored. Otherwise, it must be equal to the length of the sequence in SEQ.

The alignment section of a SAM file may contain a number of optional fields. Each optional field is defined by a standard tag accompanied by a data type and a value, in the following format:

TAG:TYPE:VALUE

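The TAG is a two-character string. There are several predefined standard tags for SAM optional fields. The complete list is available at “https://samtools.github.io/hts-specs/SAMtags.pdf”. Users may also define their own tags; the SAM specification reserves tags that start with “X”, “Y”, or “Z”, and tags that contain lowercase letters, for this purpose.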

The TYPE is a single character defining the data type of the field. It can be “A” for a single printable character, “B” for a numeric array, “f” for a real number, “H” for a byte array in hexadecimal format, “i” for an integer, and “Z” for a string.

The VALUE is the content of the field, written in the format required by the data type given in TYPE.
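The TAG:TYPE:VALUE layout described above can be read with a few lines of Python. The following is a minimal sketch rather than a complete parser: the function name `parse_optional_field` and the example values are made up for illustration, and the handling of the “B” type (a comma-separated list whose first item names the element subtype) follows the SAM specification rather than the description in this section.

```python
def parse_optional_field(field):
    """Split one SAM optional field 'TAG:TYPE:VALUE' into (tag, type, value)."""
    # Split on the first two colons only, because a 'Z' string may contain ':'.
    tag, type_code, raw = field.split(":", 2)
    if type_code == "i":
        value = int(raw)
    elif type_code == "f":
        value = float(raw)
    elif type_code == "B":
        # Assumed layout (per the SAM specification): subtype letter, then values.
        subtype, *items = raw.split(",")
        convert = float if subtype == "f" else int
        value = [convert(item) for item in items]
    elif type_code == "H":
        value = bytes.fromhex(raw)
    else:
        # 'A' (single character) and 'Z' (string) are kept as text.
        value = raw
    return tag, type_code, value


# Hypothetical optional fields, tab-separated as they would be in a SAM record.
example = "NH:i:1\tHI:i:1\tAS:i:98\tNM:i:0"
parsed = [parse_optional_field(f) for f in example.split("\t")]
for tag, type_code, value in parsed:
    print(tag, type_code, value)
```

Splitting with a limit of two keeps any colons inside a string value intact, which matters for free-text tags.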

Notice that the last four columns in the SAM file shown in Figure 2.16 are for optional fields identified by the four predefined standard tags: “NH”, “HI”, “AS”, and “NM”. The “NH” tag shows the number of reported alignments (number of hits) that contain the read

TABLE 2.4  CIGAR Operations and Descriptions

Operation    Description
M            Alignment match, which can be a sequence match or mismatch
I            Insertion to the reference sequence
D            Deletion from the reference sequence
N            Skipped region from the reference sequence
S            Soft clip on the read (present in SEQ)
H            Hard clip on the read (not present in SEQ)
P            Padding (silent deletion from the padded reference sequence)
=            Sequence match
X            Sequence mismatch
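To illustrate how the operations in Table 2.4 are used, the sketch below splits a CIGAR string into (length, operation) pairs and adds up how many bases the alignment covers on the read and on the reference. The function names and the example CIGAR string are hypothetical. The rule for which operations advance the read (M, I, S, =, X) and which advance the reference (M, D, N, =, X) is taken from the SAM specification, not from the table itself.

```python
import re

CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")      # operations that use up bases of SEQ
CONSUMES_REFERENCE = set("MDN=X")  # operations that use up reference bases


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs for a CIGAR string."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_PATTERN.findall(cigar)]


def cigar_lengths(cigar):
    """Return (query_length, reference_length) covered by the alignment."""
    ops = parse_cigar(cigar)
    query = sum(n for n, op in ops if op in CONSUMES_QUERY)
    reference = sum(n for n, op in ops if op in CONSUMES_REFERENCE)
    return query, reference


# Hypothetical CIGAR: 3 soft-clipped bases, matches, one insertion, one deletion.
print(parse_cigar("3S10M2I5M1D20M"))
print(cigar_lengths("3S10M2I5M1D20M"))
```

The query length computed this way should equal the length of SEQ, since soft-clipped bases are present in SEQ while hard-clipped bases are not.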

TABLE 2.3  The FLAG Bitwise Decimal and Hexadecimal Numbers and Their Descriptions

Decimal    Hexadecimal    Description of Read
1          0x1            The read is paired
2          0x2            The aligner mapped both reads of the pair properly
4          0x4            The read is unmapped
8          0x8            The next segment in the template (the mate) is unmapped
16         0x10           The sequence in SEQ is reverse complemented (minus strand)
32         0x20           The sequence of the next segment (the mate) is reverse complemented
64         0x40           First read in paired reads
128        0x80           Second read in paired reads
256        0x100          The alignment is secondary
512        0x200          The read fails platform/vendor quality checks
1024       0x400          The read is a PCR or optical duplicate (technical sequence)
2048       0x800          The alignment is supplementary
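Because the FLAG value is the sum of the bits listed in Table 2.3, it can be decoded by testing each bit in turn with a bitwise AND. The sketch below is one minimal way to do this; the function name `decode_flag` and the short labels are illustrative choices, not names defined by the SAM format.

```python
# Bit values from Table 2.3, paired with short illustrative labels.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


# 99 = 1 + 2 + 32 + 64
print(decode_flag(99))
# 147 = 1 + 2 + 16 + 128
print(decode_flag(147))
```

A FLAG of 99, for example, breaks down into 1 + 2 + 32 + 64: a paired read, properly mapped, whose mate is on the reverse strand, and which is the first read of the pair.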